A potential outcomes approach to defining and estimating gestational age-specific exposure effects during pregnancy

Many studies seek to evaluate the effects of potentially harmful pregnancy exposures during specific gestational periods. We consider an observational pregnancy cohort in which pregnant individuals can initiate medication use or become exposed to a drug at various times during their pregnancy. An important statistical challenge is how to define and estimate exposure effects when pregnancy loss or delivery can occur over time. Without proper consideration, the results of standard analyses may be vulnerable to selection bias, immortal time bias, and time-dependent confounding. In this study, we apply the “target trials” framework of Hernán and Robins to define effects based on the counterfactual approach often used in causal inference. These effects are defined relative to a hypothetical randomized trial of timed pregnancy exposures in which delivery may precede, and thus potentially interrupt, exposure initiation. We describe specific implementations of inverse probability weighting (IPW), G-computation, and Targeted Maximum Likelihood Estimation (TMLE) to estimate the effects of interest. We demonstrate the performance of all estimators using simulated data and show that a standard implementation of IPW is biased. We then apply our proposed methods to a pharmacoepidemiology study to evaluate the potentially time-dependent effect of exposure to inhaled corticosteroids on birthweight in pregnant people with mild asthma.


The sustained treatment effect parameter
In the main manuscript, we defined the intent-to-treat (ITT) parameter as the effect of initiating treatment at a given point during pregnancy, where the pregnancy duration is random and delivery may precede the planned initiation. The sustained treatment effect is instead the effect of initiating treatment at the given time point and remaining on treatment until the end of pregnancy.

Observed data
The observational data collected for each of $n$ participants are independent and identically distributed (i.i.d.) and of the form $(W(t), A(t), D(t);\ t = 1, \dots, K,\ Y)$. The full description is given in the main text; briefly, $W(t)$ represents the covariates, $A(t)$ the exposure, and $D(t)$ the delivery status at time $t$. Time $K$ is the first time point at which all subjects have delivered (i.e., $D(K) = 1$ for all subjects but $D(K-1) = 0$ for some). Let $T_D$ denote the observed time of delivery. An overbar denotes the history of a variable up to the indicated time point, e.g., $\bar{A}(k) = \{A(0), \dots, A(k)\}$ if $k > 0$ and $\bar{A}(0) = A(0)$. $Y$ is the outcome measured at delivery. We define $\sigma_k(t)$ as the indicator of following sustaining strategy $k$ up to and including time $t$; it indicates whether a participant initiated treatment at time $k$ and continued treatment until time $t$, or followed this strategy until they delivered.
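As an illustration, $\sigma_k(t)$ might be computed from wide-format arrays as in the minimal sketch below. The coding conventions are our own assumptions rather than the authors': times are indexed $t = 1, \dots, K$; a participant is treated as still pregnant at $t$ when $D(t-1) = 0$ (with $D(0) = 0$), so treatment is assessed for $t \le T_D$; strategy $k$ means $A(t) = 0$ for $t < k$ and $A(t) = 1$ for $t \ge k$, with $k = K + 1$ meaning never treat; and any treatment values recorded after delivery are ignored. The function names `delivery_time` and `sigma` are hypothetical.

```python
import numpy as np

def delivery_time(D):
    """T_D: first time t (1-based) with D(t) = 1; assumes D is monotone and D[:, -1] == 1."""
    return D.argmax(axis=1) + 1

def sigma(A, D, k):
    """Return the (n, K) 0/1 array sigma_k(t), t = 1..K (hypothetical coding).

    Strategy k: A(t) = 0 for t < k and A(t) = 1 for t >= k; k = K + 1 means never treat.
    Treatment values recorded after delivery are ignored.
    """
    n, K = A.shape
    target = (np.arange(1, K + 1) >= k).astype(int)  # treatment value the strategy requires at t
    at_risk = np.ones((n, K), dtype=bool)             # still pregnant at t: D(t-1) = 0
    at_risk[:, 1:] = D[:, :-1] == 0
    agree = (A == target) | ~at_risk                  # after delivery the strategy is trivially followed
    return np.cumprod(agree.astype(int), axis=1)      # stays 1 until the first deviation

# Toy example: three participants, K = 3
A = np.array([[0, 1, 1], [0, 1, 0], [0, 1, 0]])
D = np.array([[0, 0, 1], [0, 1, 1], [0, 0, 1]])
print(delivery_time(D))  # [3 2 3]
print(sigma(A, D, k=2))
```

Under this coding, participant 2 delivers at $T_D = 2$ after starting treatment at time 2, so the value recorded for time 3 is ignored and they follow strategy $k = 2$ throughout; participant 3 stops treatment at time 3 while still pregnant and therefore deviates at $t = 3$.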

Sustained treatment strategy parameter and identifiability assumptions
Now we define the effect of initiating and sustaining treatment at time $k$ until interrupted by delivery. The potential outcome under the sustained strategy, $Y^{\sigma_k}$, is the outcome that a participant would have had had they persisted in taking the assigned treatment from time $k$ to $K$ until interrupted by delivery. The parameter of interest is thus $E(Y^{\sigma_k})$ for each starting time $k$, allowing us to make contrasts between alternative start times.
To estimate the sustained treatment strategy effect with observational data, we require assumptions similar to those for the ITT parameter. Consistency here means that $Y = Y^{\sigma_k}$ if $\sigma_k(K) = 1$. As before, if a participant has not yet initiated treatment prior to a delivery time $T_D < k$, then their observed outcome is assumed to equal the counterfactual $Y^{\sigma_k}$ for any $k \geq T_D$. Positivity here means that, conditional on the measured covariates (including delivery status) at a time point, all subjects would have a non-zero probability of continuing to follow any sustained treatment strategy at each time point. By construction, once a delivery occurs at $T_D$, the subject has probability one of continuing to follow every strategy $k$ for which $\sigma_k(T_D) = 1$. Third, we require a stronger type of exchangeability: all baseline and time-dependent confounders of the exposure and outcome must have been measured. This assumption is stronger because we must measure the confounders of treatment taken at each time point, amongst those who have not yet delivered, rather than only the confounders of treatment initiation. Non-interference is required as before.
2 More details about the estimation of the ITT parameter

G-computation
Firstly, by construction, no deliveries have occurred at the first time point ($D(0) = 0$) and all deliveries have occurred by the final time point ($D(K) = 1$). We then need to make a small modification to our representation of the data so that it includes the outcome at every time point. If the measured outcome is $Y$, we define a time-varying outcome $Y(t)$ that is unknown (NA) up until delivery, after which it equals $Y$. Thus we initialize $Y(K+1) = Y$ and define

$$Y(t) = \begin{cases} Y(t+1) & \text{if } D(t-1) = 1, \\ \text{NA} & \text{otherwise,} \end{cases} \qquad t = K, \dots, 2,$$

so that the complete data structure includes $Y(t)$ at every time point.

Initializing $Q_{K+1} = Y$, the estimator for treatment initiation can be defined through nested expectations $Q_t$, computed recursively backward in time. We decompose each $Q_t$ according to whether delivery has occurred by $t-1$, in order to develop reasonable modeling strategies. If delivery has occurred by $t-1$, the outcome $Y = Y(t+1)$ is in the history and included in the conditioning statement, so the (nested) expectation of the outcome is equal to the outcome itself. If delivery has not yet occurred, then we need to model the expectation. For instance, we may regress the predictions $Q_{t+1}$ on covariate and treatment history amongst those who have not yet delivered at time $t-1$. We then obtain the estimates of $Q_t$ from the predictions of this regression fit, setting $A(k-1) = 0$ and, if $k < t$, also setting $A(k) = 1$, for all subjects who have not yet delivered. For those who have delivered by $t-1$, their estimate of $Q_t$ is $Y$.

3 Estimation of the sustained treatment effect parameter

IPW
For the strategy that initiates and sustains treatment starting at time $k = 1, \dots, K$, or never initiates it ($k = K + 1$), unless interrupted by delivery, we first estimate the probabilities of following the strategy at each time point. Once these probabilities are estimated, the IPW calculation for the effect of sustained treatment from time $k$ involves running an intercept-only linear regression for the outcome with weights $w^k_{\sigma,n}(K)$ equal to estimates of the inverse probability weights $w^k_\sigma(K)$. The estimated intercept from the resulting model fit is the IPW estimate of the parameter $E(Y^{\sigma_k})$.
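A minimal Python sketch of this IPW calculation follows. The data-generating process in `simulate`, the logistic propensity models for $A(t)$ given $W(t)$ and $A(t-1)$ (fitted only among participants still pregnant at the start of $t$), and all function names are hypothetical illustrations; the sketch is a generic construction and is not claimed to reproduce the exact weights $w^k_\sigma(K)$ defined in the text. Each participant's weight is $\sigma_k(K)$ divided by the product, over the times at which they were still pregnant, of the estimated probabilities of receiving the treatment value the strategy requires. Because the intercept of an intercept-only weighted least-squares fit equals the weighted mean $\sum_i w_i Y_i / \sum_i w_i$, the code returns that weighted mean directly.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n=2000, K=4, seed=1):
    """Hypothetical data-generating process, for illustration only."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n, K)); A = np.zeros((n, K), dtype=int); D = np.zeros((n, K), dtype=int)
    w_prev, a_prev, d_prev = rng.normal(size=n), np.zeros(n), np.zeros(n, dtype=int)
    for t in range(K):
        W[:, t] = 0.5 * w_prev - 0.3 * a_prev + rng.normal(size=n)  # time-varying confounder
        A[:, t] = rng.binomial(1, expit(-0.5 + 0.8 * W[:, t] + 1.5 * a_prev))
        hazard = 1.0 if t == K - 1 else expit(-2.0 + 0.7 * t)      # everyone delivered by K
        D[:, t] = np.maximum(d_prev, rng.binomial(1, hazard, size=n))
        w_prev, a_prev, d_prev = W[:, t], A[:, t], D[:, t]
    at_risk = np.ones((n, K), dtype=bool)
    at_risk[:, 1:] = D[:, :-1] == 0                                 # still pregnant at the start of t
    T = D.argmax(axis=1) + 1                                        # delivery time T_D
    Y = 10.0 * T - 2.0 * (A * at_risk).sum(axis=1) + W[:, 0] + rng.normal(size=n)
    return W, A, Y, at_risk

def fit_logistic(X, y, iters=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, X.T @ (y - p))
    return beta

def ipw_sustained(W, A, Y, at_risk, k):
    """Weighted-mean IPW estimate of E(Y^{sigma_k}); k = K + 1 means never treat."""
    n, K = A.shape
    follows = np.ones(n, dtype=bool)  # sigma_k(t), updated over t
    prob = np.ones(n)                 # product of P(A(t) = strategy value | history) while pregnant
    for t in range(K):
        r = at_risk[:, t]
        a_prev = A[:, t - 1] if t > 0 else np.zeros(n)
        X = np.column_stack([np.ones(n), W[:, t], a_prev])  # hypothetical propensity model
        p1 = expit(X @ fit_logistic(X[r], A[r, t]))         # fitted among the undelivered only
        target = 1 if t + 1 >= k else 0
        prob = np.where(r, prob * (p1 if target == 1 else 1 - p1), prob)
        follows &= ~r | (A[:, t] == target)
    w = follows / prob                                      # estimated weights
    return np.sum(w * Y) / np.sum(w)  # equals the intercept of intercept-only weighted least squares

W, A, Y, at_risk = simulate()
K = A.shape[1]
est = {k: ipw_sustained(W, A, Y, at_risk, k) for k in range(1, K + 2)}
print({k: round(v, 2) for k, v in est.items()})
```

In this simulated setting treatment lowers the outcome, so the always-treat estimate ($k = 1$) should fall below the never-treat estimate ($k = K + 1$).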
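
G-computation
For the sustained treatment effect, we define the nested expectations $Q_t$ as in the ITT setting, but with treatment at each time point from $k$ onward set according to the sustained strategy. The simplifications are the same as for the ITT setting; in the resulting expressions we take $A_t(k) = (A(k), \dots, A(t))$ to denote the treatments from time $k$ to $t$ when $k \leq t$.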
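The sequential-regression form of G-computation for the sustained strategy can be sketched as follows. Working from $t = K$ back to $t = 1$, the code regresses the current pseudo-outcome $Q_{t+1}$ on history among participants not yet delivered at $t - 1$, predicts with the treatment history set to strategy $k$, and assigns $Q_t = Y$ to those who had already delivered. The simulated data, the linear working models (covariates $W(t)$, $W(1)$, $A(t)$ and the number of treated time points before $t$), and all function names are hypothetical assumptions. In this simulation delivery timing is unaffected by treatment and each treated time point lowers $Y$ by 2, so the always-treat versus never-treat contrast should be close to $-2\,E(T_D)$, which is about $-6$ here.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n=2000, K=4, seed=1):
    """Hypothetical data-generating process, for illustration only."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n, K)); A = np.zeros((n, K), dtype=int); D = np.zeros((n, K), dtype=int)
    w_prev, a_prev, d_prev = rng.normal(size=n), np.zeros(n), np.zeros(n, dtype=int)
    for t in range(K):
        W[:, t] = 0.5 * w_prev - 0.3 * a_prev + rng.normal(size=n)  # time-varying confounder
        A[:, t] = rng.binomial(1, expit(-0.5 + 0.8 * W[:, t] + 1.5 * a_prev))
        hazard = 1.0 if t == K - 1 else expit(-2.0 + 0.7 * t)      # everyone delivered by K
        D[:, t] = np.maximum(d_prev, rng.binomial(1, hazard, size=n))
        w_prev, a_prev, d_prev = W[:, t], A[:, t], D[:, t]
    at_risk = np.ones((n, K), dtype=bool)
    at_risk[:, 1:] = D[:, :-1] == 0                                 # not yet delivered at t - 1
    T = D.argmax(axis=1) + 1                                        # delivery time T_D
    Y = 10.0 * T - 2.0 * (A * at_risk).sum(axis=1) + W[:, 0] + rng.normal(size=n)
    return W, A, Y, at_risk

def design(W, a_t, n_prev, t):
    """Hypothetical linear working model: 1, W(t), W(1), A(t), number of treated times before t."""
    n = W.shape[0]
    return np.column_stack([np.ones(n), W[:, t], W[:, 0], a_t, n_prev])

def gcomp_sustained(W, A, Y, at_risk, k):
    """Sequential-regression G-computation estimate of E(Y^{sigma_k})."""
    n, K = A.shape
    target = (np.arange(1, K + 1) >= k).astype(float)  # strategy value of A(t), t = 1..K
    A_obs = A * at_risk                                 # ignore values recorded after delivery
    n_prev_obs = np.cumsum(A_obs, axis=1) - A_obs       # observed treated times before t
    Q = Y.astype(float)                                 # Q_{K+1} = Y
    for t in range(K - 1, -1, -1):
        r = at_risk[:, t]
        X = design(W, A[:, t], n_prev_obs[:, t], t)
        beta, *_ = np.linalg.lstsq(X[r], Q[r], rcond=None)  # min-norm fit tolerates collinear columns
        X_star = design(W, np.full(n, target[t]), np.full(n, target[:t].sum()), t)
        Q = np.where(r, X_star @ beta, Y)               # already delivered: Q_t = Y
    return Q.mean()

W, A, Y, at_risk = simulate()
K = A.shape[1]
est = {k: gcomp_sustained(W, A, Y, at_risk, k) for k in range(1, K + 2)}
print({k: round(v, 2) for k, v in est.items()})
print("always vs never:", round(est[1] - est[K + 1], 2))
```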

TMLE
The TMLE procedure for the sustained treatment effect follows essentially the same steps as the procedure for the ITT parameter.
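As an illustration only, a generic longitudinal TMLE for strategy $k$ can be sketched as follows; it is a standard construction and not necessarily the authors' implementation. The outcome is rescaled to $[0, 1]$; each backward regression from the G-computation recursion is followed by a logistic fluctuation that uses the initial prediction as an offset and is fitted as an intercept-only weighted regression, with weights $\sigma_k(t)$ divided by the cumulative estimated probability of following the strategy through $t$. The data-generating process, working models, bounding constant `bound`, and all function names are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def simulate(n=2000, K=4, seed=1):
    """Hypothetical data-generating process, for illustration only."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n, K)); A = np.zeros((n, K), dtype=int); D = np.zeros((n, K), dtype=int)
    w_prev, a_prev, d_prev = rng.normal(size=n), np.zeros(n), np.zeros(n, dtype=int)
    for t in range(K):
        W[:, t] = 0.5 * w_prev - 0.3 * a_prev + rng.normal(size=n)
        A[:, t] = rng.binomial(1, expit(-0.5 + 0.8 * W[:, t] + 1.5 * a_prev))
        hazard = 1.0 if t == K - 1 else expit(-2.0 + 0.7 * t)  # everyone delivered by K
        D[:, t] = np.maximum(d_prev, rng.binomial(1, hazard, size=n))
        w_prev, a_prev, d_prev = W[:, t], A[:, t], D[:, t]
    at_risk = np.ones((n, K), dtype=bool)
    at_risk[:, 1:] = D[:, :-1] == 0                             # not yet delivered at t - 1
    T = D.argmax(axis=1) + 1
    Y = 10.0 * T - 2.0 * (A * at_risk).sum(axis=1) + W[:, 0] + rng.normal(size=n)
    return W, A, Y, at_risk

def fit_logistic(X, y, iters=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, X.T @ (y - p))
    return beta

def fluctuate(offset, y, w, iters=50):
    """Weighted intercept-only logistic fit with an offset (Newton steps); returns epsilon."""
    eps = 0.0
    for _ in range(iters):
        p = expit(offset + eps)
        eps += np.sum(w * (y - p)) / max(np.sum(w * p * (1 - p)), 1e-12)
    return eps

def design(W, a_t, n_prev, t):
    """Hypothetical linear working model for the initial outcome regressions."""
    n = W.shape[0]
    return np.column_stack([np.ones(n), W[:, t], W[:, 0], a_t, n_prev])

def tmle_sustained(W, A, Y, at_risk, k, bound=1e-3):
    """Longitudinal TMLE sketch for E(Y^{sigma_k})."""
    n, K = A.shape
    lo, hi = Y.min(), Y.max()
    Ys = (Y - lo) / (hi - lo)                           # outcome rescaled to [0, 1]
    target = (np.arange(1, K + 1) >= k).astype(float)
    A_obs = A * at_risk
    n_prev_obs = np.cumsum(A_obs, axis=1) - A_obs
    # Weights H[:, t] = sigma_k(t) / prod_{s <= t} P(A(s) = strategy value | history)
    H = np.zeros((n, K)); follows = np.ones(n, dtype=bool); prob = np.ones(n)
    for t in range(K):
        r = at_risk[:, t]
        a_prev = A[:, t - 1] if t > 0 else np.zeros(n)
        Xg = np.column_stack([np.ones(n), W[:, t], a_prev])
        p1 = expit(Xg @ fit_logistic(Xg[r], A[r, t]))
        prob = np.where(r, prob * (p1 if target[t] == 1 else 1 - p1), prob)
        follows &= ~r | (A[:, t] == target[t])
        H[:, t] = follows / prob
    # Backward recursion: initial regression, then a targeting step at each t
    Q = Ys.copy()
    for t in range(K - 1, -1, -1):
        r = at_risk[:, t]
        beta, *_ = np.linalg.lstsq(design(W, A[:, t], n_prev_obs[:, t], t)[r], Q[r], rcond=None)
        X_star = design(W, np.full(n, target[t]), np.full(n, target[:t].sum()), t)
        q0 = np.clip(X_star @ beta, bound, 1 - bound)   # initial prediction under strategy k
        eps = fluctuate(logit(q0[r]), Q[r], H[r, t])    # followers have observed history = strategy
        Q = np.where(r, expit(logit(q0) + eps), Ys)     # already delivered: Q_t = Y (rescaled)
    return lo + (hi - lo) * Q.mean()

W, A, Y, at_risk = simulate()
K = A.shape[1]
est = {k: tmle_sustained(W, A, Y, at_risk, k) for k in range(1, K + 2)}
print({k: round(v, 2) for k, v in est.items()})
```

Only participants who have followed the strategy through $t$ receive non-zero weight in the fluctuation, and for them the observed treatment history coincides with the strategy, which is why the prediction under strategy $k$ can serve as the offset.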